MySQL Index Overviews: Hash Indexes

Comments 0

Share to social media

Welcome back to the MySQL indexing series! We’ve already covered the nuances surrounding indexes in MySQL as well as walked you through B-tree indexes.

MySQL, however, has many other index types it can offer for your use case, and one of those indexes are hash indexes, too.

What are Hash Indexes?

In MySQL, hash indexes are indexes that are used in queries that use the equality operators like = or <=> (which is the MySQL NULL safe equality operator, equivalent to the SQL Standard IS NOT DISTINCT FROM). Hash indexes are not used in other situations, so they can be useful for things like random or generated PRIMARY KEY values where you are looking up single rows and not needing to order rows or searching for a range of values.

As per the documentation, MySQL only supports the use of hash indexes for the MEMORY storage engine, and these kinds of indexes are known to facilitate blazing-fast single-row searches through data residing in memory. Hash indexes may start to become an option when index structures can save your disk in return for occupying computational power and the memory available on your server.

Note: Such indexes can also be defined within other storage engines (I’ll get to that in a second); however, in use cases not revolving around the MEMORY storage engine, hash indexes are likely to be turned into B-tree indexes.

The Subtleties of Hash Indexes

Hash indexes are not as widely implemented as B-Tree indexes, and as mentioned, part of that is their narrow use case. But even though they can be useful, there are a few concerns that you need to have to be aware of:

For the hash table, one needs to know the number of rows likely to be within the table. That helps with determining whether the amount of memory available within our server will be sufficient to store hash indexes on our data.

Hash indexes may have to be rebuilt from time to time for them to be useful. Rebuilding hash indexes may become necessary when your data changes considerably—if you have larger or smaller tables than expected, are not shutting down your server, and want your index to shrink, hash indexes may have to be rebuilt.

That’s where we come to the upsides of storing such indexes in the memory of your server because as our data grows, that could add an overhead on our disk I/O, but because hash indexes primarily work with data residing in memory, the rebuilding of such index types is likely to be significantly faster.

Hash indexes are used for in-memory tables. Since the only way to acquire and work with in-memory tables within MySQL and its counterparts is the usage of MEMORY storage engine, the use cases of hash indexes will also involve MEMORY-based tables within MySQL.

Aside from that, it would be a shame not to note that disk-based storage engines such as InnoDB provide support for a type of hash index called Adaptive Hash Indexes. More on this later in the article.

Hash Indexes in MariaDB

Now that you’ve familiarized yourself with hash indexes, it’s time to dig into what makes them tick. I’ll use a demo table to demonstrate hash indexes in action. Its structure will be defined like so (note – here, we don’t denote MEMORY as our storage engine to demonstrate how MariaDB “thinks in InnoDB” internally when hash indexes are defined):

After we have the InnoDB table up and running, we will also drop a hash-based index on top of it using the CREATE INDEX query:

When we combine the CREATE INDEX clause with the CREATE TABLE query in phpMyAdmin, we have the following result:

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\Hash Indexes in MariaDB - Creating a Table.png


Image 1 – Creating Hash Indexes when Creating a Table in MariaDB

After we execute the code and then inspect the structure of our table once, we can see that the hash index is indeed a BTREE index under the hood – that happened because the default table storage engine definition for all tables that don’t have the storage engine defined is InnoDB:

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\Hash Index = BTREE Index.png


Image 2 – The Hash Index you expected to create is actually created as a BTREE Index

Let’s now try doing the same thing with a table running the MEMORY storage engine instead (uses the same code, with one small change to the CREATE TABLE statement to denote the usage of the MEMORY storage engine):

Now, execute the create statements again, and you will see a different output:

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\Hash Indexes on a MEMORY Table.png


Image 3 – Defining a Hash Index on the MEMORY Storage Engine

If we inspect the memory-based table structure now, we will see that all of our indexes are hash indexes instead:

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\Indexes on a MEMORY Table.png


Image 4 – Hash Indexes in MariaDB

Even the primary key index is a hash index! Interesting, right?

That’s because our table is based on the MEMORY storage engine, and as all of its data is stored in memory, MySQL doesn’t have to default to using B-tree indexes and can suggest (and choose) the faster option – a hash-based index – instead.

The same is true for clustered indexes – define an AUTO_INCREMENT on an ID column on a memory-based storage engine, and you will have a hash-based clustered index, too – it will be hyper-fast to its clustered nature and hash-based structure. Cool, right?

D:\Misc\Redgate\Blogs\2024\August 2024\Hash Indexes\Images\AutoIncrement on MEMORY Table - Query.png

Image 5 – Creating an AUTO_INCREMENT Column Using the MEMORY Storage Engine

D:\Misc\Redgate\Blogs\2024\August 2024\Hash Indexes\Images\AutoIncrement on MEMORY Table - Results.png


Image 6 – the AUTO_INCREMENT Column on a MEMORY Table is a HASH Index

Now, we will insert some data into our table and review how such indexes work.

Hash Indexes in Action

First, let’s insert some data into our in-memory-based table – for that, we can make use of ordinary INSERT or LOAD DATA INFILE statements without inventing anything revolutionary either (you can see the details for loading the data in the Appendix):

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\LOAD DATA INFILE on MEMORY table.png


Image 7 – LOAD DATA INFILE Into a MEMORY Table

Everything works as usual – COUNT(*) queries too:

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\SELECT COUNT.png


Image 8 – COUNT(*) On a MEMORY Table

One thing I would direct your attention to is the time taken by MariaDB to complete each query – the first query was completed in 0.01 sec., and the second one in 0.002 sec., and that’s not because we heavily optimized our database, but because our data resides in memory and can be accessed much faster. This can be confirmed once we run some basic tests on an InnoDB table and one running the MEMORY storage engine.

First, note that the same table (the structure is the same, too) will occupy space in memory the same way it occupies space on the disk, meaning that, likely, you won’t be able to store as much data in memory as you would be able to store on the disk. The MEMORY storage engine has some more caveats, too:

  1. MEMORY tables are similar to MyISAM tables in that they perform full-table locks when performing INSERT, UPDATE, and DELETE operations. X amount of these operations means X amount of full table locks per second.
  2. MEMORY tables still incur disk I/O: the MySQL daemon will contact the .frm file belonging to the MEMORY table to verify that the table exists before parsing an SQL query. In other words, if you run many SQL queries against a MEMORY table, you can notice some disk I/O.

These caveats are discussed in a StackOverflow thread by a former Pythian employee, and they do have an impact on how you work with your database and the operations performed within. One of those operations may be a comparison of the storage engines (InnoDB vs. MEMORY):

D:\Misc\Redgate\Blogs\2024\August 2024\Hash Indexes\Images\InnoDB vs. Memory - InnoDB Query Part 2.png


Image 9 – InnoDB Table Example

D:\Misc\Redgate\Blogs\2024\August 2024\Hash Indexes\Images\InnoDB vs. Memory - MEMORY Query Part 2.png


Image 10 – MEMORY Table Example

Here, the difference may be negligible because we only have 5,000 rows we run our tests on; once the row count in our tables increases, the results will become more and more visible. However, at the same time, the amount of space taken on the memory will increase accordingly – chances are that you don’t have more than 64GB of RAM available, so tread that line carefully.

To understand these results, consider this illustration of hash indexes:

Image 11 – Basic Illustration of Hash Indexes in MySQL

Since hash indexes use a hash function to map keys to an index position (see illustration above), such an algorithm becomes very useful when searching for specific data. Let’s try hash indexes out – we will do that with 7 queries, which will search for different things using the equality, less than, more than, or all of those operators. We will:

  1. Search for an ID value using the “more than” operator. (>)
  2. Search for an ID value using the “equal or more than” operator. (>=)
  3. Search for a string value with an exact match. (=)
  4. Search for two exact string values using an OR operator.
  5. Utilize a LIKE query that uses an index (we won’t have a wildcard in front of the query.)
  6. Search for an ID value using the NULL safe equality operator. (<=>)
  7. Run a search using the “less than or equal to” operator. (<=)

Our results will look like so:

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\EXPLAIN SELECT.png


Image 12 – EXPLAINing HASH Indexes in MariaDB

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\EXPLAIN SELECT 2.png


Image 13 – EXPLAINing HASH Indexes in MariaDB Part 2

Attentive readers will notice that the internal functionality of hash indexes is somewhat different than that of B-trees. From the examples above, we can notice that, unlike B-tree indexes that can be used together with the =, <, >=, <, <= or BETWEEN operators, hash-based indexes can only be considered for use if we search for an exact match together with the = or <=> operators.

In other words, hash indexes are useless if we are searching for more than a single value because our database won’t even consider such an index for use in the first place. That happens because hash indexes are used to improve data access speed when employing them in a single-value lookup operation and not anything else – as the core premise of such indexes is data residing in memory, they don’t have the luxury of assisting searches for a range of values or anything in that realm, and that’s why queries using clauses like BETWEEN, LIKE, >, or < are out of the question. They won’t make use of a hash index, no matter how your query is written.

To find a row using a hash index, your database will calculate the hash value of the value you’re searching for, then perform a search of the buckets of values that hashed to that same value. If it finds a result in the bucket, the result is returned.

The examples above support all hash index characteristics defined in the documentation of MySQL.

MySQL’s Adaptive Hash Index

Many users of MySQL will also be aware of an index called the adaptive hash index. The InnoDB adaptive hash index acts as a feature assisting in-memory hash lookups for indexing. InnoDB builds the adaptive hash index at runtime, and such an index adapts to your workload in the sense that if it “senses” that a particular value is being looked up more often, it creates an entry in itself either with the full value or its prefix, depending on your use case. Then, if such a value is searched for once again, MySQL will then use the Adaptive Hash Index instead of a B-tree index.

Such an index can be enabled by fiddling with the innodb_adaptive_hash_index variable (available values include 0 and 1), and it can be used to “add a layer” between MySQL and the InnoDB buffer pool. Then, the entire structure of your database would look like this:

Image 14 – The Adaptive Hash Index in Between the InnoDB Buffer Pool and the Disk

In MySQL, MariaDB, and Percona Server, the adaptive hash index is enabled by default and will be built automatically if InnoDB deems it to be necessary:

D:\Misc\Redgate\Blogs\August 2024\Hash Indexes\Images\SHOW VARIABLES innodb adaptive.png


Image 15 – An Adaptive Hash Index is Enabled by Default

Coming back to hash indexes, though, understand that a hash index works by taking the keys of the indexed values and “hashing them” into output values. As such, and as I’ve already mentioned numerous times in this blog post alone, hash indexes should only be used if we’re 100% certain that we will only ever need a column to be looked up with equality operators and not with anything else.

To recap, consider a hash index when:

  • The primary types of SQL queries you find yourself running are related to equality comparisons. If you’re building an app on a key-value store, hash indexes can help you.
  • You don’t have a necessity to search for a range of values. In other words, if your query is using the “less than” or “more than” operators to assist your use case, hash indexes won’t work.
  • Your use case involves storing data in memory (for that, consider using the MEMORY storage engine or InnoDB with the Adaptive Hash Index if your use case necessitates InnoDB.)
  • You carefully consider the upsides and downsides of the multiple types of indexes available in MySQL, and the usage of a B-tree index doesn’t satisfy your use case.
  • Your queries don’t include an ORDER BY clause (if they do, a hash index wouldn’t even be considered for use.)

Finally, remember that hash indexes are only as useful as the memory within your server—exceed that limit, and your data is cactus. Don’t store too much data in memory: such indexes have drawbacks, and those drawbacks can burn your database to ashes if not considered in full.

Summary

Hash indexes are an exclusive type of index that makes searching through in-memory values as quick as possible by avoiding disk I/O. The downside of this kind of index type in MySQL is that it can only be directly with the MEMORY storage engine (or in InnoDB with an Adaptive Hash Index in some cases), that the only kinds of data that can be indexed is in-memory data due to the storage engine in use, and that SQL queries will only make use of a hash index when an exact equality operator is in use.

Nonetheless, these kinds of indexes have their use case: it’s up to you to decide whether the use case is something that should be used in your specific scenario or not. No matter which kind of index type you elect to use, keep in mind that each one of them has upsides and downsides unique to themselves: choose the one most useful for your specific use case and until next time.

Appendix – Table Structure & Data

The table structure and data in this blog were generated by a demo data generator Mockaroo. The data used in examples can be found at https://ghostbin.cloud/v6svw.

The table structure in question is as follows:

Article tags

Load comments

About the author

Lukas Vileikis

See Profile

Lukas Vileikis is an ethical hacker and a frequent conference speaker. Since 2014, Lukas has found and responsibly disclosed security flaws in some of the most visited websites in Lithuania. He runs one of the biggest & fastest data breach search engines in the world - BreachDirectory.com, frequently speaks at conferences and blogs in multiple places including his blog over at lukasvileikis.com.